Nagios: Network Monitoring on the Cheap

Mark Leighton Fisher on 2006-01-27T19:11:55

Network monitoring is like doing your income tax early – a practice more often praised than actually followed in real life. It doesn't help that commercial solutions cost one to two orders of magnitude more than Windows or Office (the software packages most PC users work with). Even large businesses like their software budgets to buy more than one item per department for the entire year.

Nagios is an Open Source network monitoring system where all of the code that interacts with the network is implemented as plug-ins. Many of these plug-ins accept a large number of options that can be set in the Nagios configuration files. This gives you tremendous flexibility in configuring your network monitoring, particularly because of how easy it is to write a basic plug-in (including in Perl, as you might expect).

A host in Nagios usually corresponds to a single IP address. DNS names can be used for hosts instead, at the risk of totally losing track of a host's status when DNS goes down (as it did to me this week). Hostgroups are named groups of hosts (as you would expect). Each host is expected to be a member of one or more hostgroups.

Services are what the ordinary user sees when they access a host – HTTP, SMTP, SSH, etc. Services are what people normally keep track of in Nagios. Behind the scenes, Nagios uses ping (by default) to track whether hosts are up or down.

Nagios comes with a whole host of plug-ins for various services – not only the more common base services like ping, SMTP, HTTP, HTTPS, FTP, SNMP, etc. but also Oracle, PostgreSQL, FlexLM, etc. There are also plug-ins for tracking system-local information like disk space, number of processes, etc. as well as a few plug-ins to track system-local information remotely. Plug-ins are relatively easy to write – if I recall correctly, I once spent a couple of hours writing a specialty plug-in in Perl (no surprises there...) that tracked whether a database-driven website was functioning properly.
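To give a feel for how little a basic plug-in needs, here is a minimal sketch in Python rather than Perl. It relies on the standard plug-in convention: print one status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The function names and the 80%/90% thresholds are my own assumptions for illustration, not anything shipped with Nagios.

```python
"""Sketch of a Nagios-style plug-in that checks disk usage on one path."""
import shutil

# Exit codes defined by the Nagios plug-in convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(used_percent, warn=80.0, crit=90.0):
    """Map a usage percentage to an exit code (thresholds are illustrative)."""
    if used_percent >= crit:
        return CRITICAL
    if used_percent >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (exit_code, status_line) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - cannot read {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero capacity"
    used_percent = 100.0 * usage.used / usage.total
    code = classify(used_percent, warn, crit)
    return code, f"DISK {LABELS[code]} - {path} is {used_percent:.1f}% full"


def main(argv):
    """Print the status line and return the exit code Nagios should see."""
    target = argv[1] if len(argv) > 1 else "/"
    code, line = check_disk(target)
    print(line)
    return code
```

To run this as a real plug-in, the file would end with `import sys` and `sys.exit(main(sys.argv))`, so that the exit code reaches Nagios.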

The "people" Nagios knows about are known as contacts, which have properties like names, email addresses, pager numbers, etc. Contacts are grouped into contactgroups, which helps in larger installations as the Linux server admins may not be the Windows server admins, who probably are not the mainframe admins. Contactgroups are also what Nagios sends an alert to when a service or host has a problem.

Alerts can be sent by email or pager out of the box. Alerts are implemented by external commands, so you could have Nagios drive an X10 device wired to a big flashing light in the Engineering Department if they requested one.
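Because an alert is just an external command, the handler can be any script. The sketch below assumes the command definition passes the Nagios macros $NOTIFICATIONTYPE$, $HOSTNAME$, $SERVICEDESC$ and $SERVICESTATE$ as four arguments. The script name, the choice of "urgent" states and the set_light function are hypothetical; set_light only prints, standing in for whatever would really drive the X10 hardware.

```python
"""Sketch of an alert handler that Nagios could run as an external command."""
import sys

# States treated as urgent enough to switch the light on (an assumption).
URGENT_STATES = {"CRITICAL", "DOWN", "UNREACHABLE"}


def build_alert(notification_type, host, service, state):
    """Format one alert line; an empty service means a host-level alert."""
    subject = f"{host}/{service}" if service else host
    return f"{notification_type}: {subject} is {state}"


def light_should_be_on(notification_type, state):
    """On for urgent problems, off for anything else (recoveries included)."""
    return notification_type == "PROBLEM" and state in URGENT_STATES


def set_light(on):
    """Hypothetical stand-in for the code that would drive the X10 hardware."""
    print("light ON" if on else "light OFF")


def main(argv):
    """Expect: handler.py NOTIFICATIONTYPE HOSTNAME SERVICEDESC STATE."""
    if len(argv) != 5:
        print("usage: handler.py TYPE HOST SERVICE STATE", file=sys.stderr)
        return 2
    _, notification_type, host, service, state = argv
    print(build_alert(notification_type, host, service, state))
    set_light(light_should_be_on(notification_type, state))
    return 0
```

Keeping the formatting and the on/off decision in separate functions means the same handler could feed email, a pager gateway or the light without changing the logic.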

Once you learn the basic format of Nagios configuration files, Nagios is relatively easy to set up. I once set up 12 services on 3 computers in an hour.
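The configuration format is regular enough that you can generate it from a list of names and addresses. The following is a rough sketch that renders host, service and hostgroup definitions. The inventory, the group name, the generic-host and generic-service templates, and the check_http, check_ssh and check_smtp command names are all assumptions that would have to match what your own installation defines.

```python
"""Sketch: render Nagios object definitions from a small inventory."""


def define(kind, directives):
    """Render one `define kind { ... }` block from (name, value) pairs."""
    lines = [f"define {kind} {{"]
    lines += [f"    {name:<24}{value}" for name, value in directives]
    lines.append("}")
    return "\n".join(lines)


def render(inventory, group_name):
    """inventory maps host name -> (IP address, [(description, check command)])."""
    blocks = []
    for host, (address, services) in inventory.items():
        blocks.append(define("host", [
            ("use", "generic-host"),  # assumed template supplying defaults
            ("host_name", host),
            ("address", address),
        ]))
        for description, command in services:
            blocks.append(define("service", [
                ("use", "generic-service"),  # assumed template
                ("host_name", host),
                ("service_description", description),
                ("check_command", command),
            ]))
    # One hostgroup holding every host in the inventory.
    blocks.append(define("hostgroup", [
        ("hostgroup_name", group_name),
        ("alias", group_name),
        ("members", ",".join(inventory)),
    ]))
    return "\n\n".join(blocks)


# Hypothetical inventory; addresses are from the documentation range 192.0.2.0/24.
INVENTORY = {
    "web1": ("192.0.2.10", [("HTTP", "check_http"), ("SSH", "check_ssh")]),
    "mail1": ("192.0.2.20", [("SMTP", "check_smtp")]),
}

config_text = render(INVENTORY, "example-servers")
print(config_text)
```

Using IP addresses rather than DNS names in the inventory keeps the checks working when DNS itself is the thing that is down.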

One advanced feature is dependencies, which allow you to specify which systems depend on each other. I have not needed this, but someone running a commercial Internet site might find it handy (think "lots and lots of routers along the way...").
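To illustrate why dependencies matter when routers sit between you and your servers, here is a simplified model of the idea. It is not how Nagios itself implements dependencies. Given a map of which host depends on which, it separates the failing hosts that are root causes from those that are probably just knock-on failures. The host names and the root_causes function are hypothetical.

```python
"""Sketch: use dependencies to separate root-cause failures from knock-on ones."""


def root_causes(depends_on, failing):
    """Return the failing hosts with no failing dependency, direct or indirect."""

    def has_failing_dependency(host, seen):
        for dep in depends_on.get(host, ()):
            if dep in seen:
                continue  # guard against cycles in the dependency map
            seen.add(dep)
            if dep in failing or has_failing_dependency(dep, seen):
                return True
        return False

    return sorted(h for h in failing if not has_failing_dependency(h, {h}))


# Hypothetical topology: web1 sits behind router-b, which sits behind router-a.
DEPENDS_ON = {
    "web1": ["router-b"],
    "router-b": ["router-a"],
    "db1": ["router-a"],
}

# Everything looks down, but only router-a is worth waking someone up for.
print(root_causes(DEPENDS_ON, {"web1", "db1", "router-a", "router-b"}))
```

With a map like this, one broken router produces one alert instead of one alert for every host behind it.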

Although there is a lot more to Nagios than I have covered here, Nagios is pretty simple to set up considering the amount of power and flexibility it provides. (Yes, I am sold on Nagios.) As a part-time system administrator (my full-time gigs have been mainly software engineering work for 19 years now), I have spent the biggest share of my Nagios setup time just getting the names and IP addresses of the systems I need to monitor – Nagios configuration has been quick once I had all that information together.

Even if you are a developer whose applications just consume distributed computing resources (like an electronic CAD app that runs on HP-UX, accesses Oracle on AIX and SAP on a mainframe, with clients on Windows PCs), I think that you will find learning about Nagios to be time well spent.